Abstract

Decentralized multiple access channels where each transmitter wants to selfishly maximize his transmission
energy-efficiency are considered. Transmitters are assumed to freely choose their power control policy and to
interact (through multiuser interference) several times. It is shown that the corresponding conflict of interest
can have a predictable outcome, namely a finitely or discounted repeated game equilibrium. Remarkably, it is
shown that this equilibrium is Pareto-efficient under reasonable sufficient conditions and that the corresponding
decentralized power control policies can be implemented under realistic information assumptions: only individual
channel state information and a public signal are required to implement the equilibrium strategies. Explicit
equilibrium conditions are derived in terms of minimum number of game stages or maximum discount factor.
Both analytical and simulation results are provided to compare the performance of the proposed power control
policies with those already existing and exploiting the same information assumptions, namely those derived for
the one-shot and Stackelberg games.

Index Terms 

Cognitive radio, energy-efficiency, Folk theorem, Nash equilibrium, power control games, repeated games.

I. Introduction

Many current wireless communications systems (e.g., cellular networks) are optimized in terms of quality
of service (QoS), which can include, for example, performance criteria such as transmission rate, reliability,
latency, or security. Applications where trade-offs have to be found between QoS and energy consumption
have become more and more important, especially over the past decade. Wireless sensor networks and
ad hoc networks are two good examples illustrating the importance of finding such trade-offs.
A very simple and pragmatic way of knowing to what extent a communication is energy-efficient has been
proposed by [1][2]. The authors of [1][2] define energy-efficiency as the ratio of the net number of information
bits that are transmitted without error per unit time (goodput) to the transmit power level. More specifically, the
authors analyze the problem of distributed power control (PC) in flat fading multiple access channels (MACs).
The problem is formulated as a non-cooperative game where the players are the transmitters, the action of a
given player is her/his/its transmit power ("his" is chosen in this paper), and his payoff/reward/utility function
is the energy-efficiency of his communication with the receiver. The results reported in [1][2] have been
extended to the case of multi-carrier systems in [3]. Unfortunately, as shown in [2], Nash equilibria (NE)
resulting from the one-shot game formulation of the energy-efficient power control problem are generally
inefficient. This is one of the reasons why some authors proposed to apply other game-theoretic concepts
to improve the efficiency of the network equilibrium: [4] proposed a pricing mechanism and [5] proposed
to introduce some hierarchy in the network by considering a Stackelberg game [6] or using successive
interference cancellation at the receiver. The solution of [4] has the advantage of being Pareto-optimal (PO)
but requires global channel state information (CSI) at the transmitters, and equilibrium uniqueness is not
proven analytically. On the other hand, the solution of [5] is not PO but only requires individual CSI and
uniqueness is guaranteed.

All the cited and related works on energy-efficient PC ([1]-[5], etc.) have at least one common point:
time is divided into windows or blocks over which the channel is assumed to be constant and transmit 
power levels can be updated only once within a given block. The corresponding framework is the one of 
static or one-shot games which is to say, transmitters play independently from block to block and maximize 
their instantaneous utility for each block. In this paper, we consider a more general situation: transmitters 
are allowed to update their power levels several times within a block; the corresponding PC type could be 
called decentralized fast PC (DFPC), generalizing the more conventional decentralized slow PC (DSPC) for 
which the power can be updated only once per block. Both in the DFPC and DSPC cases, we want to take 
into account the fact that players (namely the transmitters) interact several times within a block or/and from 
block to block, which introduces new types of behaviors (cooperation, punishment, etc) with respect to the 
one-shot game. The framework considered here is the one of dynamic games. More specifically, we analyze 
a special case of dynamic games, which is the case of repeated games (RG). In standard repeated games 
[7][8][9], the same game is played a finite or infinite number of times and players are interested in optimizing
a certain performance metric, resulting from averaging their utility over the whole duration of the game. In
contrast with iterative or learning techniques that are based on mild information assumptions and different
behavior assumptions (transmitters can be modeled by automata), RG generally require more demanding
information and behavior assumptions. Also, RG aim at optimizing an averaged utility and reaching points
more efficient than the one-shot game NE. Two important models of RG are considered: the finitely repeated
game (FRG) [10] and the discounted repeated game (DRG) [11]. A priori, the FRG seems to be more suited to
DFPC since the number of times players interact is finite and can be known (e.g., the number of training
symbols in a block) whereas the DRG seems to be more suited to DSPC with an uncertain number of blocks
over which players interact. To the authors' knowledge, only a small fraction of the wireless literature is
dedicated to repeated games. As far as the present paper is concerned, the most relevant
contributions available are [12][13][14]. With respect to these works, our contributions are as follows. First, the
work reported in this paper is the first to apply the concept of RG to energy-efficient PC (the existing
works consider Shannon transmission rates or similar utility functions). A second important feature of the
present analysis is that only individual CSI and the signal-to-interference plus noise ratio (SINR) are needed to
implement the proposed PC scheme, which is not the case in [12][13][14]. Third, two models of RG are
considered and explicit equilibrium conditions on the number of game stages (for the FRG) and the discount
factor (for the DRG) are provided and discussed, which is not done in the existing literature. Last but
not least, the PC policy we propose is compared in a fair manner with existing game-theoretic PC policies
both analytically and by simulations and shown to be the most efficient one (in the sense of Pareto). For
this purpose, several works from the game theory literature, not yet used by the wireless community,
are exploited.



This paper is structured as follows. In Sec. II-A the assumed signal model is described. This is followed
(Sec. II-B) by a short review of the static/one-shot non-cooperative and Stackelberg PC games. In Sec.
III a rigorous RG formulation of the PC problem is provided and the information assumptions necessary to
implement the equilibrium strategies of Sec. IV-B are given. The proposed equilibria (finitely and discounted
repeated game equilibria) and their properties are analyzed in Sec. IV. The results derived are illustrated
by simulations in Sec. V, which is followed by the conclusion (Sec. VI).

II. System model 

A. Signal model 

We consider a decentralized MAC with a finite number of transmitters, which is denoted by $K$. The
network is said to be decentralized in the sense that the receiver (e.g., a base station) does not dictate
to the transmitters (e.g., mobile stations) their PC policy. Rather, all the transmitters choose their policy
by themselves and want to selfishly maximize their energy-efficiency; in particular, they can ignore some
specified centralized policies. We assume that the users transmit their data over quasi-static channels, at
the same time and in the same frequency band. Note that a block is defined as a sequence of $M$ consecutive
symbols which comprises a training sequence, that is, a certain number of consecutive symbols used to estimate
the channel (or other related quantities) associated with a given block. A block has therefore a duration less
than the channel coherence time. The signal model used corresponds to the information-theoretic channel
model used for studying MAC [15][16]; see e.g., [17] for more comments on the multiple access technique
involved. What matters is that this model is simple to present, captures the different aspects
of the problem (the SINR structure in particular), and can be readily applied to specific systems such as
CDMA systems [2][5] or multi-carrier CDMA systems [3]. The equivalent baseband signal received by the
base station can be written as

$$y(n) = \sum_{i=1}^{K} g_i(n)\, x_i(n) + z(n) \qquad (1)$$

where $i \in \mathcal{K}$, $\mathcal{K} = \{1, ..., K\}$, $x_i(n)$ represents the symbol transmitted by transmitter $i$ at time $n$,
$E|x_i|^2 = p_i$, the noise $z$ is assumed to be distributed according to a zero-mean Gaussian random variable with
variance $\sigma^2$, and each channel gain $g_i$ varies over time but is assumed to be constant over each block. For each
transmitter $i$, the channel gain modulus is assumed to lie in a compact set: $|g_i|^2 \in [\eta_i^{\min}, \eta_i^{\max}]$. This assumption
is both practical (for example, such limitations can model the finite receiver sensitivity and the existence of a
minimum distance between the transmitter and receiver) and important to guarantee the existence of the
proposed equilibria (Sec. IV). Finally, the receiver is assumed to implement single-user decoding.



B. Review of the one-shot power control game 

Here, we review a few key results from [2] concerning the static non-cooperative PC game. In order
to define the static PC game some notations need to be introduced. We denote by $R_i$ the transmission
information rate (in bps) for user $i$ and by $f$ an efficiency function representing the block success rate, which
is assumed to be sigmoidal and identical for all the users; the sigmoidness assumption is a reasonable
assumption, which is well justified in [18][19]. Recently, [20] has shown that this assumption is also justified
from an information-theoretic standpoint. At a given instant, the SINR of transmitter $i \in \mathcal{K}$ at the receiver
is written as:

$$\mathrm{SINR}_i = \frac{p_i |g_i|^2}{\sum_{j \neq i} p_j |g_j|^2 + \sigma^2} \qquad (2)$$

where $p_i$ is the power level for transmitter $i$. With these notations, the static PC game, called $\mathcal{G}$, is defined
in its normal form as follows.
Definition 1 (Static PC game): The static PC game is a triplet $\mathcal{G} = (\mathcal{K}, \{\mathcal{A}_i\}_{i \in \mathcal{K}}, \{u_i\}_{i \in \mathcal{K}})$ where $\mathcal{K}$ is
the set of players, $\mathcal{A}_1, ..., \mathcal{A}_K$ are the corresponding sets of actions, $\mathcal{A}_i = [0, P_i^{\max}]$, $P_i^{\max}$ is the maximum
transmit power for player $i$, and $u_1, ..., u_K$ are the utilities of the different players which are defined by:

$$u_i(p_1, ..., p_K) = \frac{R_i f(\mathrm{SINR}_i)}{p_i} \quad [\mathrm{bit/J}]. \qquad (3)$$

In this game with complete information ($\mathcal{G}$ is known to every player) and rational players (every player does
the best for himself and knows the others do so and so on), an important game solution concept is the NE
(i.e., a point from which no player has interest in unilaterally deviating). When it exists, the non-saturated
NE of this game can be obtained by setting $\partial u_i / \partial p_i$ to zero, which gives an equivalent condition on the SINR:
the best SINR in terms of energy-efficiency for transmitter $i$ has to be a solution of $x f'(x) - f(x) = 0$
(this solution is independent of the player index since a common efficiency function is assumed, see [19]
for more details). This leads to:

$$\forall i \in \mathcal{K}, \quad p_i^* = \frac{\sigma^2}{|g_i|^2} \frac{\beta^*}{1 - (K-1)\beta^*} \qquad (4)$$

where $\beta^*$ is the unique solution of the equation $x f'(x) - f(x) = 0$. By using the term "non-saturated NE"
we mean that the maximum transmit power for each user, denoted by $P_i^{\max}$, is assumed to be sufficiently
high not to be reached at the equilibrium, i.e., each user maximizes his energy-efficiency for a value less
than $P_i^{\max}$ (see [21] for more details). An important property of the NE given by (4) is that transmitters only
need to know their individual channel gain $|g_i|$ to play their equilibrium strategy. One of the interesting
results of this paper is that it is possible to obtain a more efficient equilibrium point by repeating the game
$\mathcal{G}$ while keeping this key property.
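The NE characterization above can be illustrated with a minimal sketch. It solves $x f'(x) - f(x) = 0$ by bisection and evaluates the non-saturated NE powers $p_i^* = \frac{\sigma^2}{|g_i|^2} \frac{\beta^*}{1-(K-1)\beta^*}$. The sketch assumes the efficiency function $f(x) = e^{-c/x}$ with $c = 2^R - 1$; the rate, channel gains, and noise variance are hypothetical values chosen only so that $(K-1)\beta^* < 1$, and the helper names are not taken from the paper.

```python
import math

def solve_root(g, lo, hi, iters=200):
    """Bisection, assuming g is positive left of the root and negative right of it."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical parameters (not from the paper).
R = 0.1                      # rate, small enough for a non-saturated NE
c = 2.0 ** R - 1.0
sigma2 = 1.0                 # noise variance
gains_sq = [0.5, 1.0, 2.0]   # |g_i|^2 for K = 3 transmitters
K = len(gains_sq)

def f(x):
    """Assumed efficiency function f(x) = exp(-c / x)."""
    return math.exp(-c / x)

def df(x):
    """Derivative of the assumed efficiency function."""
    return (c / x ** 2) * math.exp(-c / x)

# beta_star solves x f'(x) - f(x) = 0.
beta_star = solve_root(lambda x: x * df(x) - f(x), 1e-3, 10.0)

def ne_powers(gains_sq, sigma2, beta):
    """NE powers sigma^2 / |g_i|^2 * beta / (1 - (K - 1) * beta)."""
    K = len(gains_sq)
    if (K - 1) * beta >= 1.0:
        raise ValueError("non-saturated NE requires (K - 1) * beta < 1")
    return [sigma2 / g2 * beta / (1.0 - (K - 1) * beta) for g2 in gains_sq]

def sinr(i, powers, gains_sq, sigma2):
    """SINR of transmitter i under single-user decoding."""
    interference = sum(p * g2 for j, (p, g2) in enumerate(zip(powers, gains_sq)) if j != i)
    return powers[i] * gains_sq[i] / (sigma2 + interference)

p_star = ne_powers(gains_sq, sigma2, beta_star)
sinrs = [sinr(i, p_star, gains_sq, sigma2) for i in range(K)]
```

For this particular $f$ the root is $\beta^* = c$ in closed form, which gives a quick check on the numerical solver. Every transmitter ends up with the same SINR $\beta^*$ while using only its own channel gain.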

C. Review of the Stackelberg power control game 

Here we review a few key results from [5]. The corresponding results will be used in Sec. IV and
V where the performance of the proposed RG equilibrium is compared with that of the Stackelberg
equilibrium (SE). The main difference between the one-shot game [2] and the Stackelberg game is that
in the latter, one of the transmitters is the leader of the game. The other transmitters are assumed to be
able to observe the actions of the leader and react to them accordingly; these transmitters are called the
followers. Note that the leader knows that his actions are observed and therefore anticipates the reaction
of the followers. In [5], it is shown that the SE power profile Pareto-dominates the one-shot NE power
profile. The power played by transmitter $i \in \mathcal{K}$ at the SE when he is the game leader (L) is given by:

$$p_i^L = \frac{\sigma^2}{|g_i|^2} \frac{\gamma^*(1+\beta^*)}{1 - (K-1)\gamma^*\beta^* - (K-2)\beta^*}$$

where $\gamma^*$ is the unique solution of $x \left[ 1 - \frac{(K-1)\beta^*}{1-(K-2)\beta^*}\, x \right] f'(x) - f(x) = 0$, and
the corresponding utility is

$$u_i^L = R_i \frac{|g_i|^2}{\sigma^2} \frac{1 - (K-1)\gamma^*\beta^* - (K-2)\beta^*}{\gamma^*(1+\beta^*)} f(\gamma^*).$$

On the other hand, the power played by transmitter $i \in \mathcal{K}$ at the SE when he is one of the game followers (F)
and the corresponding utility are given by

$$p_i^F = \frac{\sigma^2}{|g_i|^2} \frac{\beta^*(1+\gamma^*)}{1 - (K-1)\gamma^*\beta^* - (K-2)\beta^*} \quad \text{and} \quad u_i^F = R_i \frac{|g_i|^2}{\sigma^2} \frac{1 - (K-1)\gamma^*\beta^* - (K-2)\beta^*}{\beta^*(1+\gamma^*)} f(\beta^*).$$

III. Moving from static to repeated PC games 

A. Strategic information assumptions 

First, we want to show that, in the non-saturated regime (as defined in the preceding section), the
energy-efficient one-shot PC game has a very interesting structure in terms of CSI. This structure, which is
analyzed just below, has two consequences. The first consequence is that only individual CSI is required at the
transmitters, that is, only $|g_i|$ needs to be known by transmitter $i$. In the existing literature ([2][3][5], etc.)
this game feature is observed once the equilibrium solution has been determined. It turns out that it is
due to the structure of the utility functions of the one-shot game and not only to the specific solutions
analyzed in the existing literature. This can therefore be checked before deriving the equilibrium solution.
A simple way of proving this statement is to consider a game where utilities are normalized:

$$\frac{u_i(p_1, ..., p_K)}{R_i |g_i|^2} = \frac{f(\mathrm{SINR}_i)}{p_i |g_i|^2}.$$

By making the change of variables $a_i = p_i |g_i|^2$, the normalized utility function becomes:

$$\tilde{u}_i(a_1, ..., a_K) = \frac{1}{a_i}\, f\!\left( \frac{a_i}{\sigma^2 + \sum_{j \neq i} a_j} \right).$$

It is seen from the normalized utilities that channel gains only play a role
when players de-normalize their actions by computing $p_i = a_i / |g_i|^2$, which shows that only individual CSI is
needed to play the game. The second consequence of the structure of $u_i$ is that the repeated versions of the
one-shot game are easier to analyze. Indeed, the normalized utilities $\tilde{u}_i$ depend on time only through
the action profile $a = (a_1, ..., a_K)$ and not through $|g_i|$. This means that the PC stochastic repeated game can
be analyzed as the repeated version of a (normalized) static game; this is why, in the sequel, we will not
use a time index for $u_i$. In addition to the individual CSI assumption, we also assume that every player can
observe a public signal. More specifically, we assume that every player knows at each stage of the game
the signal

$$\omega = \sigma^2 + \sum_{i=1}^{K} |g_i|^2 p_i. \qquad (5)$$

By noticing that $\forall i \in \mathcal{K}$, $p_i |g_i|^2 \times \frac{\mathrm{SINR}_i + 1}{\mathrm{SINR}_i} = \omega$, we see that each transmitter can construct the
public signal from the sole knowledge of his action, individual channel gain, and individual SINR. One of the main
results of this paper is that the two mentioned information assumptions allow one to implement
a cooperation plan between the transmitters corresponding to an efficient equilibrium. This is possible with
the repeated game formulation of the problem, which is given below.
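The claim that each transmitter can rebuild the public signal locally can be checked numerically. The sketch below computes $\omega = \sigma^2 + \sum_i |g_i|^2 p_i$ directly and compares it with the local expression $p_i |g_i|^2 (\mathrm{SINR}_i + 1)/\mathrm{SINR}_i$ evaluated by each transmitter. The power levels, channel gains, and noise variance are hypothetical values, the function names are illustrative, and all powers are assumed strictly positive so that the SINRs are non-zero.

```python
def public_signal(powers, gains_sq, sigma2):
    """Global definition: omega = sigma^2 + sum_i |g_i|^2 p_i."""
    return sigma2 + sum(p * g2 for p, g2 in zip(powers, gains_sq))

def sinr(i, powers, gains_sq, sigma2):
    """SINR of transmitter i under single-user decoding."""
    interference = sum(p * g2 for j, (p, g2) in enumerate(zip(powers, gains_sq)) if j != i)
    return powers[i] * gains_sq[i] / (sigma2 + interference)

def local_public_signal(p_i, g2_i, sinr_i):
    """What transmitter i computes from its own power, gain, and SINR only."""
    return p_i * g2_i * (sinr_i + 1.0) / sinr_i

# Hypothetical operating point (not from the paper).
powers = [0.2, 0.5, 0.1]
gains_sq = [1.5, 0.8, 2.0]
sigma2 = 0.1

omega = public_signal(powers, gains_sq, sigma2)
local = [
    local_public_signal(powers[i], gains_sq[i], sinr(i, powers, gains_sq, sigma2))
    for i in range(len(powers))
]
```

All entries of `local` coincide with `omega` up to rounding, even though no transmitter uses the powers or gains of the others.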

B. Repeated game formulation of the power control problem 

In the static PC game, each transmitter observes the channel gain associated with the current block, i.e.,
$|g_i|$, and updates his power level according to (4) in order to maximize his instantaneous utility. In repeated
games, players want to maximize their averaged utility. To define the latter quantity we first need to define
several notions, namely a game stage, the game history, and the strategy of a player. Game stages correspond
to instants at which players can choose their actions. To be concrete, in the case of DSPC, game stages
coincide with blocks whereas in the case of DFPC, game stages coincide with sub-blocks (comprising
one or several symbols). The proposed framework is the one of repeated games with a public signal (see
e.g., [22][23]) in which the public signal (namely (5)) is a deterministic function of the played actions.
We denote by

$$\Omega = \left[ \sigma^2, \; \sigma^2 + \sum_{i=1}^{K} \eta_i^{\max} P_i^{\max} \right]$$

the interval the public signal lies in. The pair of vectors
$(\omega^t, p_i^t) = (\omega(1), ..., \omega(t-1), p_i(1), ..., p_i(t-1))$ is called the history of the game for transmitter $i$ at time
$t$ and lies in the set $\mathcal{H}_t = \Omega^{t-1} \times [0, P_i^{\max}]^{t-1}$. This is precisely the history $h_t = (\omega^t, p_i^t)$ which is assumed to be
known by the transmitters before playing at stage $t$. With these notations, a pure strategy of a transmitter
in the repeated game can be defined properly.

Definition 2 (Players' strategies in the RG): A pure strategy $\tau_i$ for player $i \in \mathcal{K}$ is a sequence of causal
functions $(\tau_{i,t})_{t \geq 1}$ with

$$\tau_{i,t} : \mathcal{H}_t \to [0, P_i^{\max}], \quad h_t \mapsto p_i(t). \qquad (6)$$

The strategy of player $i$, which is a sequence of functions, will be denoted by $\tau_i$ (removing the game stage
index in (6) is a common way to refer to a player's strategy). The vector of strategies $\tau = (\tau_1, ..., \tau_K)$ will be
referred to as a joint strategy and lies in the set $\mathcal{T}$. A joint strategy $\tau$ induces in a natural way a unique action
plan $(p(t))_{t \geq 1}$. To each profile of powers $p(t) = (p_1(t), ..., p_K(t))$ corresponds a certain instantaneous utility
$u_i(p(t))$ for player $i$. In our setup, each player does not only care about what he gets at a given stage but
also about what he gets over the whole duration of the game. This is why we consider utility functions resulting
from averaging over the instantaneous utility. More precisely, we consider two utility functions which correspond
to the two important models of repeated games under investigation, namely the FRG and DRG.
Definition 3 (Players' utilities in the RG): Let $\tau = (\tau_1, ..., \tau_K)$ be a joint strategy. The utility for player
$i \in \mathcal{K}$ is defined by:

$$v_i^T(\tau) = \frac{1}{T} \sum_{t=1}^{T} u_i(p(t)) \ \text{ in the FRG}, \qquad v_i^\lambda(\tau) = \sum_{t=1}^{+\infty} \lambda (1-\lambda)^{t-1} u_i(p(t)) \ \text{ in the DRG} \qquad (7)$$

where $p(t)$ is the power profile of the action plan induced by the joint strategy $\tau$, $T \geq 1$ is the number of
game stages in the FRG, and $0 < \lambda < 1$ is a parameter of the DRG called the discount factor and is known
to every player (since the game is with complete information).
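The two averaged utilities can be sketched in a few lines. The FRG utility is the plain average of the $T$ stage utilities and the DRG utility weights stage $t$ by $\lambda(1-\lambda)^{t-1}$. In the sketch below the infinite DRG sum is truncated after a finite horizon, which is an approximation introduced here; the function names and numerical values are hypothetical.

```python
def frg_utility(stage_utilities):
    """FRG utility: average of the T instantaneous utilities u_i(p(t))."""
    T = len(stage_utilities)
    return sum(stage_utilities) / T

def drg_utility(stage_utility, lam, horizon=10_000):
    """DRG utility truncated after `horizon` stages.

    stage_utility(t) returns u_i(p(t)) for t = 1, 2, ...;
    lam is the discount factor (0 < lam < 1).
    """
    return sum(
        lam * (1.0 - lam) ** (t - 1) * stage_utility(t)
        for t in range(1, horizon + 1)
    )

# Hypothetical example: a constant stage utility is left unchanged by both averages.
v_frg = frg_utility([5.0] * 20)
v_drg = drg_utility(lambda t: 5.0, lam=0.1)

# Hypothetical example: a high utility at stage 1 followed by a low one afterwards.
v_drg_drop = drg_utility(lambda t: 4.0 if t == 1 else 1.0, lam=0.5)
```

Since the weights $\lambda(1-\lambda)^{t-1}$ sum to one, a larger $\lambda$ puts more weight on the first stages, which is why short-term gains matter more when $\lambda$ is large.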

In this paper, we consider two models of RG because these are complementary in a certain sense. In
scenarios where the duration over which the transmitters interact is known (this is typically the case when
transmit power levels can only be updated during the training phase of the block) and/or transmitters value
the different instantaneous utilities uniformly, the FRG seems to be more suited. In the currently available
wireless literature on the problem under investigation the DRG is used as follows: the discount factor is
used in [12] as a way of accounting for the delay sensitivity of the network; the discount factor is used in
[14] to give the transmitters the possibility to value short-term and long-term gains differently. Interestingly,
[8][9] offer another interpretation of this model. Indeed, the author sees the DRG as an FRG where $T$
would be unknown to the players and considered as an integer-valued random variable, finite almost surely,
whose law is known by the players. Otherwise said, $\lambda$ can be seen as the stopping probability at each game
stage: the probability that the game stops at stage $t$ is thus $\lambda(1-\lambda)^{t-1}$. The function $v_i^\lambda$ would correspond
to an expected utility given the law of $T$. This shows that the discount factor is also useful to study wireless
games where a player enters/leaves the game. We would also like to mention that it can also model a
heterogeneous DRG where players have different discount factors, in which case $\lambda$ represents $\min_i \lambda_i$ (as
pointed out in [21]). In practice, such a parameter can be acquired through a public signal from the receiver.
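Definition 4 (Equilibrium strategies in the RG): A joint strategy $\tau$ supports an equilibrium of the RG
defined by $(\mathcal{K}, \{\mathcal{T}_i\}_{i \in \mathcal{K}}, \{v_i\}_{i \in \mathcal{K}})$ if $\forall i \in \mathcal{K}, \forall \tau_i' \in \mathcal{T}_i$, $v_i(\tau) \geq v_i(\tau_i', \tau_{-i})$ where $v_i$ equals $v_i^T$ or $v_i^\lambda$, $-i$ is
the standard notation to refer to the set $\mathcal{K} \setminus \{i\}$; here $\tau_{-i} = (\tau_1, ..., \tau_{i-1}, \tau_{i+1}, ..., \tau_K)$.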

An important issue is precisely to characterize the set of possible equilibrium utilities in the RG. When
the game is with complete information and perfect monitoring, there is a theorem which provides the
corresponding characterization from the sole knowledge of the possible utilities in the static game. This
theorem is called the "Folk theorem" (see e.g., [7][8]). Recall that the game is said to be with perfect
monitoring when every player is able to observe at each stage the actions chosen by all the other players.
Whereas this knowledge can be acquired in certain scenarios where appropriate estimation and sensing
mechanisms are implemented, one of our objectives is to show that some of these information assumptions
can be relaxed by exploiting the specific structure of energy-efficient PC games. In the next section, we
make explicit the information assumptions needed to implement the proposed cooperation plan. This
cooperation plan is built from a point of the feasible utility region of $\mathcal{G}$ and has been studied in [24] in the
framework of centralized networks.

IV. Equilibrium analysis of the repeated PC game 

Folk theorems aim at characterizing the set of equilibrium utilities of RG for different types of RG
(finite/infinite/discounted RG), equilibria (NE, correlated equilibria, communication equilibria, etc.), and
information assumptions (complete/incomplete information, perfect/imperfect monitoring). In this paper, the
objective is much more modest since we focus on a given equilibrium point and a specific game. One of the
goals of this section is to show that the proposed equilibrium has several attractive properties for wireless
networks: (a) as shown in the previous section, only individual CSI is required at the transmitters; (b) it
is fair in terms of SINR like the NE in the one-shot game $\mathcal{G}$; (c) it is PO under sufficient but reasonable
conditions; (d) it is more efficient than the SE point of [5] under sufficient but
reasonable conditions; (e) it is always more efficient than the NE in the one-shot game $\mathcal{G}$; (f) it is subgame
perfect (this notion will be explained in Sec. IV-B) in the case of the DRG. The corresponding equilibrium
relies on a cooperation plan exploiting two points of the one-shot game: the NE point presented in Sec.
II-B and the operating point for which a detailed study is required.



A. An interesting operating point of the game $\mathcal{G}$

By considering all the points $(p_1, ..., p_K)$ such that $p_i \in [0, P_i^{\max}]$, $i \in \mathcal{K}$, one obtains the feasible
utility region. We consider a subset of points of this region for which the power profiles $(p_1, ..., p_K)$ verify
$p_i |g_i|^2 = p_j |g_j|^2$ for all $(i,j) \in \mathcal{K}^2$. The considered subset therefore consists of the solutions of the following
system of equations:

$$\forall (i,j) \in \mathcal{K}^2, \quad \frac{\partial u_i}{\partial p_i}(p) = 0 \ \text{ with } \ p_i |g_i|^2 = p_j |g_j|^2. \qquad (8)$$

It turns out that, following the lines of the proof of SE uniqueness in [5], it is easy to show that a sufficient
condition for ensuring both existence and uniqueness of the solution to this system of equations is that
there exists $x_0 \in \left]0, \frac{1}{K-1}\right[$ such that $\frac{f''(x)}{f'(x)} - \frac{2(K-1)}{1-(K-1)x}$ is strictly positive on $]0, x_0[$ and strictly negative on
$\left]x_0, \frac{1}{K-1}\right[$. This condition is satisfied for the two following efficiency functions: $f(x) = (1 - e^{-x})^M$ [1] and
$f(x) = e^{-c/x}$ [20] with $c = 2^R - 1$ ($R$ is the transmission rate). If this condition should be found to be
too restrictive it is always possible to directly derive a purely numerical condition, which can be translated
into a condition on $K$, $M$ (in the case of [1]), or $R$ [20]. Under the aforementioned condition, the unique
solution of (8) can be checked to be:

$$\forall i \in \mathcal{K}, \quad \tilde{p}_i = \frac{\sigma^2}{|g_i|^2} \frac{\tilde{\gamma}}{1 - (K-1)\tilde{\gamma}} \qquad (9)$$

where $\tilde{\gamma}$ is the unique solution of $x[1 - (K-1)x] f'(x) - f(x) = 0$. It is important here to distinguish between
the equal-SINR condition imposed in (8) and the equal-SINR solution of the one-shot game $\mathcal{G}$. In the first
case, the SINR has a special structure imposed by the condition (8), which is $\mathrm{SINR}_i = \frac{p_i |g_i|^2}{\sigma^2 + (K-1) p_i |g_i|^2}$.
Therefore, each transmitter is assumed to maximize a single-variable utility function $u_i(q_i)$ with
$q_i = \left( \frac{|g_i|^2}{|g_1|^2} p_i, \frac{|g_i|^2}{|g_2|^2} p_i, ..., \frac{|g_i|^2}{|g_K|^2} p_i \right)$. In the second case (one-shot NE), the solution is the solution of a
$K$-unknown $K$-equation system and it happens that the solution has the equal-SINR property. The proposed
operating point (OP), given by (9), is thus fair in the sense of the SINR since $\forall i \in \mathcal{K}$, $\mathrm{SINR}_i = \tilde{\gamma}$. Another
question to be answered is whether it is efficient. To answer this question we proceed in three steps. First, we
provide sufficient conditions under which it is PO. Second, we provide some conditions under which it
Pareto-dominates the SE point derived in [5]. Third, we show that the proposed OP always Pareto-dominates the
NE point of the one-shot game $\mathcal{G}$. Recall that PO stands for Pareto-optimal.

Proposition 5 (Sufficient conditions for PO): Let $\mathcal{U}_{\mathcal{G}}$ be the achievable utility region for the game $\mathcal{G}$. Let
$u^{\max}$ be an upper bound for the utilities of all transmitters. Let $\bar{\mathcal{U}}_{\mathcal{G}}$ be the complementary set of $\mathcal{U}_{\mathcal{G}}$ in
$[0, u^{\max}]^K$. If $\mathcal{U}_{\mathcal{G}}$ or $\bar{\mathcal{U}}_{\mathcal{G}}$ is a convex region then the vector of utilities $u(\tilde{p})$ is Pareto-optimal.
The proof is provided in App. A. Whereas it appears to be a difficult task to fully characterize the frontier of the
achievable utility region for an arbitrary choice of sigmoidal functions $f$ and therefore obtain the mildest
condition for Pareto-optimality, many simulations have shown us that with usual efficiency functions [1][20],
the assumption "$\mathcal{U}_{\mathcal{G}}$ or $\bar{\mathcal{U}}_{\mathcal{G}}$ is convex" is reasonable but unfortunately not always valid. Of course, the fact
that we do not provide a general proof does not mean that the proposed OP is not efficient under milder
conditions. In all the simulations we have performed (not only those reported in Sec. V), the proposed OP
was PO. The addressed technical problem is in fact a quite general problem encountered in game theory,
especially in economics, and some non-trivial refinements could be brought to our analysis.
Proposition 6 (SE vs OP): Let $p_i^L$ (resp. $p_i^F$) be the power of transmitter $i$ at the equilibrium of the
Stackelberg game of [5] when $i$ is the game leader (resp. one of the game followers). Denote by $u_i^L$, $u_i^F$ the
corresponding utilities. Then, we have:

(i) $\forall i \in \mathcal{K}$, $u_i(\tilde{p}) \geq u_i^L$;

(ii) $\exists K_0$, $\forall K \geq K_0$, $\forall i \in \mathcal{K}$, $u_i(\tilde{p}) \geq u_i^F$.
The first statement of this proposition indicates that a transmitter always prefers playing the OP to being
a leader of the hierarchical game of [5]. The second statement shows that this is also true for the followers
when the network reaches a certain size in terms of users. All the simulations we have performed have
shown that $K_0$ equals 3.

Proposition 7 (NE vs OP): It is always true that $\forall i \in \mathcal{K}$, $u_i(\tilde{p}) \geq u_i(p^*)$ where $p^* = (p_1^*, ..., p_K^*)$.
An important message conveyed by this proposition is that every transmitter always prefers the optimal
strategy of the RG to the optimal strategy of the one-shot game. In the next section, we will see that
the proposed OP corresponds to a possible agreement between the players, which allows one to obtain an
equilibrium point in the FRG and DRG. Note that other points of $\mathcal{G}$ which are both feasible and individually
rational could be used to build a cooperation plan. Designing other cooperation plans based on the repeated
game formulation is a relevant extension of this paper.
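The comparison between the OP and the one-shot NE can be illustrated numerically. The sketch below solves $x[1-(K-1)x]f'(x) - f(x) = 0$ for $\tilde{\gamma}$ and $x f'(x) - f(x) = 0$ for $\beta^*$, then evaluates the normalized utility $f(s)[1-(K-1)s]/(\sigma^2 s)$ that every transmitter obtains when all SINRs equal $s$. It assumes $f(x) = e^{-c/x}$ with $c = 2^R - 1$; the values of $K$, $R$, and $\sigma^2$ are hypothetical and the helper names are illustrative.

```python
import math

def solve_root(g, lo, hi, iters=200):
    """Bisection, assuming g is positive left of the root and negative right of it."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical parameters (not from the paper).
K, sigma2, R = 3, 1.0, 0.1
c = 2.0 ** R - 1.0

def f(x):
    """Assumed efficiency function f(x) = exp(-c / x)."""
    return math.exp(-c / x)

def df(x):
    """Derivative of the assumed efficiency function."""
    return (c / x ** 2) * math.exp(-c / x)

# One-shot NE SINR and OP SINR.
beta_star = solve_root(lambda x: x * df(x) - f(x), 1e-3, 10.0)
gamma_tilde = solve_root(
    lambda x: x * (1.0 - (K - 1) * x) * df(x) - f(x), 1e-3, 1.0 / (K - 1)
)

def equal_sinr_utility(s):
    """Normalized utility u_i / (R_i |g_i|^2) when every transmitter has SINR s."""
    return f(s) * (1.0 - (K - 1) * s) / (sigma2 * s)

def op_powers(gains_sq):
    """OP powers sigma^2 / |g_i|^2 * gamma_tilde / (1 - (K - 1) * gamma_tilde)."""
    return [sigma2 / g2 * gamma_tilde / (1.0 - (K - 1) * gamma_tilde) for g2 in gains_sq]

u_op = equal_sinr_utility(gamma_tilde)
u_ne = equal_sinr_utility(beta_star)
```

For this $f$ the OP SINR has the closed form $\tilde{\gamma} = c/(1+(K-1)c)$, which is smaller than $\beta^* = c$: transmitters target a lower SINR at the OP, spend less power, and obtain a higher utility than at the NE.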



B. Equilibrium strategies of the repeated PC games 

First, we state two theorems providing equilibrium strategies that have the properties mentioned in Sec.
IV-A. These theorems correspond to the FRG and DRG respectively.

Theorem 8 (Equilibrium strategies in the FRG): Let $T_0$ be some integer. Assume that the following
condition is met: $T \geq T_0$ with:

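[Equation (10), the closed-form expression of $T_0$, is not recoverable from this copy; its legible fragments
involve $\eta_i^{\min}$, $\eta_i^{\max}$, $f(\tilde{\gamma})[1-(K-1)\tilde{\gamma}]$, $f(\beta^*)/\beta^*$, and a sum over $j \neq i$.] (10)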



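Then, for all $i \in \mathcal{K}$, the following action plan is an NE of the $T$-stage FRG for any distribution of the
channel gains and any $(T, T_0)$ verifying (10): $\forall t \geq 1$,

$$\tau_{i,t} : \begin{cases} \tilde{p}_i & \text{if } t \in \{1, 2, ..., T - T_0\} \\ p_i^* & \text{if } t \in \{T - T_0 + 1, ..., T\} \\ P_i^{\max} & \text{if someone deviates from the above cooperative plan.} \end{cases} \qquad (11)$$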
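The FRG action plan can be read as a simple decision rule, sketched below. The sketch takes $T_0$ as an input that is assumed to satisfy (10) rather than computing it, and it assumes that a deviation flag is available from some detection mechanism. The function and argument names are hypothetical.

```python
def frg_action(t, T, T0, deviation_seen, p_tilde_i, p_star_i, p_max_i):
    """Power played by transmitter i at stage t (1 <= t <= T) of the T-stage FRG.

    deviation_seen: True if a deviation from the plan was detected at an earlier stage.
    p_tilde_i: OP power, p_star_i: one-shot NE power, p_max_i: maximum power.
    """
    if not 1 <= t <= T:
        raise ValueError("stage index must lie in {1, ..., T}")
    if deviation_seen:
        return p_max_i          # punishment: transmit at maximum power
    if t <= T - T0:
        return p_tilde_i        # first phase: cooperate at the OP
    return p_star_i             # last T0 stages: play the one-shot NE

# Hypothetical values: T = 10 stages, the last T0 = 3 played at the NE.
plan = [frg_action(t, 10, 3, False, 0.1, 0.2, 1.0) for t in range(1, 11)]
```

With these values the plan consists of seven stages at the OP power followed by three stages at the NE power, and any detected deviation switches the transmitter to its maximum power.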
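Theorem 9 (Equilibrium strategies in the DRG): Assume that the following condition is met:

$$\lambda \leq \frac{\eta_i^{\min}\, \delta(\beta^*, \tilde{\gamma})}{\eta_i^{\min}\, \delta(\beta^*, \tilde{\gamma}) + \eta_i^{\max} \left[ (K-1) f(\beta^*) - \delta(\beta^*, \tilde{\gamma}) \right]} \qquad (12)$$

where $\delta(\beta^*, \tilde{\gamma}) = \frac{1 - (K-1)\tilde{\gamma}}{\tilde{\gamma}} f(\tilde{\gamma}) - \frac{1 - (K-1)\beta^*}{\beta^*} f(\beta^*)$. Then, for all $i \in \mathcal{K}$, the following action plan is a
subgame perfect NE of the DRG for any distribution of the channel gains:

$$\forall t \geq 1, \quad \tau_{i,t} : \begin{cases} \tilde{p}_i & \text{if all the other players play } \tilde{p}_{-i} \\ p_i^* & \text{otherwise.} \end{cases} \qquad (13)$$

For the proofs see App. D. Here, we restrict our attention to interpreting these theorems.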
Comment 1 (Equilibrium conditions). Before starting the game, the players agree on a certain
cooperation/punishment plan. Each transmitter $i \in \mathcal{K}$ always transmits at $\tilde{p}_i$ if no deviation is detected and
plays a punishment level otherwise ($P_i^{\max}$ or $p_i^*$ depending on the RG under consideration). The proposed
strategies support an equilibrium if the gain brought by deviating is less than the expected loss induced by
the punishment procedure applied by the other transmitters. To make sure that this effectively occurs, the
game must be sufficiently long in the FRG and the game stopping probability sufficiently low in the DRG.
This explains the presence of the lower bound on $T$ and upper bound on $\lambda$.
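The upper bound on $\lambda$ can be evaluated numerically under stated assumptions. The sketch below reads the DRG condition as $\lambda \leq \eta^{\min}\delta / (\eta^{\min}\delta + \eta^{\max}[(K-1)f(\beta^*) - \delta])$ with $\delta = \frac{1-(K-1)\tilde{\gamma}}{\tilde{\gamma}} f(\tilde{\gamma}) - \frac{1-(K-1)\beta^*}{\beta^*} f(\beta^*)$, and it assumes $f(x) = e^{-c/x}$, for which $\beta^* = c$ and $\tilde{\gamma} = c/(1+(K-1)c)$ solve the two defining equations. The values of $K$, $R$, and the channel gain bounds are hypothetical.

```python
import math

def delta(beta, gamma, K, f):
    """Per-stage gain of the OP over the one-shot NE in normalized utility."""
    return ((1.0 - (K - 1) * gamma) / gamma * f(gamma)
            - (1.0 - (K - 1) * beta) / beta * f(beta))

def lambda_max(eta_min, eta_max, beta, gamma, K, f):
    """Largest discount factor allowed by the assumed reading of the DRG condition."""
    d = delta(beta, gamma, K, f)
    return eta_min * d / (eta_min * d + eta_max * ((K - 1) * f(beta) - d))

# Hypothetical parameters (not from the paper).
K, R = 3, 0.1
c = 2.0 ** R - 1.0

def f(x):
    """Assumed efficiency function f(x) = exp(-c / x)."""
    return math.exp(-c / x)

beta_star = c                           # root of x f'(x) - f(x) = 0 for this f
gamma_tilde = c / (1.0 + (K - 1) * c)   # root of x [1 - (K - 1) x] f'(x) - f(x) = 0

lam_equal = lambda_max(1.0, 1.0, beta_star, gamma_tilde, K, f)    # constant channel gain
lam_spread = lambda_max(0.5, 2.0, beta_star, gamma_tilde, K, f)   # gains within [0.5, 2]
```

A wider range of channel gains lowers the admissible $\lambda$: the deviation gain is bounded using the best channel and the punishment loss using the worst one, so the game must be expected to last longer for cooperation to be sustained.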
Comment 2 (Cooperation plan). In the DRG, the cooperation plan consists in always transmitting at
the powers corresponding to the OP analyzed in Sec. IV-A. In the FRG, the cooperation plan includes a
phase where the transmitters play the one-shot game NE, which can seem surprising. Since the game has a
finite number of stages, players who deviate at the last stage of the game cannot be
punished. If a player deviates earlier, it can happen that the punishment undergone by the deviator is not
sufficiently severe. Therefore, the agreement consisting in playing the one-shot NE during the second phase
of the cooperation plan corresponds to a selfish trade-off between the gain brought by deviating without
being punished severely enough and the one brought by playing at an efficient point (namely the OP, which
Pareto-dominates the one-shot NE).

• Comment 3 (Deviation detection mechanism). The cooperation plan is implementable only if the
transmitters can detect a deviation from the cooperation plan. It turns out that the knowledge of the public signal
is sufficient for this purpose. Indeed, when the transmitters play at the OP, the public signal is equal to a constant.
Therefore, if one transmitter deviates from the OP, the public signal $\omega(t)$ is no longer constant. Of course,
if several transmitters deviate from the equilibrium in a coordinated manner, this detection mechanism can
fail; this is not inherent to the proposed cooperation plan but to the NE definition. If this issue should turn
out to be crucial, it would be necessary to consider other solution concepts such as strong equilibria, which
are stable to deviations of coalitions [25], [26], [8], but this is out of the scope of this paper.
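The detection mechanism can be illustrated with a small numerical sketch. Several things here are assumptions rather than definitions taken from the text: the public signal is modeled as the noise variance plus the total received power, the interference is divided by a spreading factor `N`, and every function name is hypothetical. Under these assumptions all transmitters are received with the same power at the OP, so the signal does not depend on the channel realization.

```python
import random

def op_powers(gains_sq, sigma2, gamma, N):
    """Powers giving every transmitter the same SINR gamma (channel inversion)."""
    K = len(gains_sq)
    r = gamma * sigma2 / (1.0 - (K - 1) * gamma / N)  # common received power
    return [r / g for g in gains_sq]

def sinr(i, powers, gains_sq, sigma2, N):
    """SINR of transmitter i, with interference scaled by 1/N (assumption)."""
    interference = sum(p * g for j, (p, g) in enumerate(zip(powers, gains_sq)) if j != i)
    return powers[i] * gains_sq[i] / (sigma2 + interference / N)

def public_signal(powers, gains_sq, sigma2):
    """Assumed public signal: noise variance plus total received power."""
    return sigma2 + sum(p * g for p, g in zip(powers, gains_sq))

def deviation_detected(omega, K, sigma2, gamma, N, tol=1e-9):
    """Compare the observed signal with its constant value at the OP."""
    expected = sigma2 + K * gamma * sigma2 / (1.0 - (K - 1) * gamma / N)
    return abs(omega - expected) > tol * expected

rng = random.Random(0)
gains_sq = [rng.expovariate(1.0) for _ in range(4)]  # Rayleigh fading: |g|^2 is exponential
powers = op_powers(gains_sq, sigma2=1e-2, gamma=0.3, N=5)
omega_ok = public_signal(powers, gains_sq, 1e-2)
powers[0] *= 1.5                                     # transmitter 0 deviates
omega_dev = public_signal(powers, gains_sq, 1e-2)
```

A unilateral change of power moves the signal away from its constant value, whereas a coordinated deviation that keeps the total received power unchanged would go unnoticed, in line with the limitation pointed out above.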
• Comment 4 (Punishment procedure). There is one difference between Theorems 8 and 9 in terms of
punishment. In the FRG, the other transmitters punish the deviator by playing at their maximum transmit
power, which is the most severe punishment possible. In the DRG, the punishment is that the other
transmitters play at the one-shot game NE. The drawback of punishing at the one-shot game NE point
is that the punishment mechanism is less efficient, that is, more game stages are needed to punish
the deviator. The advantage is that the proposed equilibrium strategy is subgame perfect [27]. The subgame
perfection property ensures that if a joint strategy $\tau$ of the RG supports an equilibrium then, after all possible
histories $h$, the joint strategy $\tau(h)$ is still an equilibrium strategy, which makes the decentralized network
performance predictable. Back to the FRG, it can be proven [10] that the subgame perfect equilibrium
property cannot be verified in this RG because the associated one-shot game has only one pure NE (whereas
at least two pure NE are needed).

• Comment 5 (Equilibrium conditions and wireless channels). The equilibrium conditions provided can
be seen to be independent of the channel statistics. If the latter are known, it is possible to refine the
bounds (see App. D for more details). To the best of the authors' knowledge, conditions of this type are not
considered in the available wireless literature, whereas it is important to know whether the models of RG are
applicable. How restrictive these conditions are can be seen from two complementary
perspectives. If the channel gain dynamics is given, the question is to know the maximum (resp. minimum)
value of the discount factor (resp. number of game stages) guaranteeing the existence of an equilibrium. The
values for $\eta_i^{\min}$ and $\eta_i^{\max}$ depend on the propagation scenario and the technology considered. In systems like
WiFi networks, these quantities typically correspond to the path loss dynamics, the receiver sensitivity, and the
minimum distance between the transmitter and receiver. On the other hand, if the discount factor (resp. the
number of game stages) is given (e.g., by the traffic statistics or training sequence length), this imposes
lower and upper bounds on the channel gains. It can happen that the admissible range for the discount factor
(or the number of game stages) is not compatible with the channel gain dynamics, in which case our
model needs to be refined.

V. Numerical results

In this section, we consider the same type of scenarios as [3], [5], namely random CDMA systems with a
spreading factor equal to $N$, the efficiency function $f(x) = (1 - e^{-x})^M$, and Rayleigh fading channels.
• The first scenario considered is simple but has the advantage that it can be represented clearly. Consider
the scenario $(K, M, N) = (2, 2, 2)$. Fig. 1 represents the normalized achievable utility region of the
one-shot PC game (normalizing the utilities allows one to conduct fair comparisons). Four important points are
highlighted: the NE of the one-shot game (circle), the SE (star), the proposed operating/cooperation point
studied in Sec. IV-A (square marker), and the point where the social welfare (sum of utilities) is maximized
(cross). From this figure it can be seen that: the utility region is convex; a significant gain can be obtained
by using a model of repeated games instead of the one-shot model; and the cooperation and optimum social
points coincide.

• Considering the same scenario, the link between the number of stages of the FRG (resp. the stopping
probability of the DRG) and the channel gain dynamics has been studied. Fig. 2 (resp. Fig. 3) represents the
quantity $10\log_{10}(\eta_i^{\max}/\eta_i^{\min})$ as a function of $T$ (resp. of $\lambda$) for $M = 2$ and different numbers of transmitters and
spreading factors: $(K, N) \in \{(2, 2), (4, 5), (10, 12)\}$. Considering the corresponding figures, the models of
RG seem to be suitable not only in scenarios where $|g_i|$ models the path loss effects but also the fading
effects. Of course, if the number of stages is too small or the stopping probability too high, more appropriate models
have to be designed.
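The relation between the stopping probability and the admissible channel gain dynamics can be sketched by inverting the DRG equilibrium condition. This is an illustration under assumptions: `delta` stands for $\delta(\beta^*,\tilde{\gamma})$ and `dev_gain` for $(K-1)f(\beta^*) - \delta(\beta^*,\tilde{\gamma})$, the numerical values used below are purely illustrative, and the function name is hypothetical.

```python
import math

def max_gain_ratio_db(lam, delta, dev_gain):
    """Largest 10*log10(eta_max / eta_min) compatible with stopping probability lam.

    Obtained by rearranging
        lam <= eta_min * delta / (eta_min * delta + eta_max * dev_gain)
    into
        eta_max / eta_min <= (1 - lam) * delta / (lam * dev_gain).
    """
    return 10.0 * math.log10((1.0 - lam) * delta / (lam * dev_gain))

# Illustrative values only (not taken from the figures).
delta, dev_gain = 0.086, 0.170
for lam in (0.001, 0.01, 0.1):
    print(lam, round(max_gain_ratio_db(lam, delta, dev_gain), 1))
```

The admissible dynamic range shrinks as the stopping probability grows and reaches 0 dB exactly when `lam` equals `delta / (delta + dev_gain)`.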

• As a third type of numerical results, the performance gain brought by the DRG formulation of the
distributed PC problem is assessed. Denote by $w_{\mathrm{NE}}$ (resp. $w_{\mathrm{SE}}$ and $w_{\mathrm{DRG}}$) the efficiency of the NE (resp.
SE and RG equilibrium) in terms of social welfare, i.e., the sum of utilities of the players. Fig. 4 represents
the quantities $\frac{w_{\mathrm{DRG}} - w_{\mathrm{NE}}}{w_{\mathrm{NE}}}$ and $\frac{w_{\mathrm{SE}} - w_{\mathrm{NE}}}{w_{\mathrm{NE}}}$ in percentage as a function of the spectral efficiency $\alpha = \frac{K}{N}$ with
$N = 128$ and $2 \leq K < \frac{N}{\beta^*} + 1$. The asymptotes $\alpha_{\max} = \frac{1}{\beta^*} + \frac{1}{N}$ are indicated in dotted lines for different
values $M \in \{10, 100\}$. The improvement becomes very significant when the system load is close to $\frac{1}{\beta^*} + \frac{1}{N}$;
this is because the power at the one-shot game NE becomes large when the system becomes more and more
loaded. As explained in [5] for the Stackelberg approach, these gains are in fact limited by the maximum
transmit power.
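The growth of the gain with the system load can be reproduced with a short sketch. It assumes that every transmitter's utility is proportional to $\phi(x) = \frac{f(x)}{x}[1 - (K-1)x/N]$ at a common SINR $x$ (with the interference divided by the spreading factor $N$), so that the relative welfare gain of the cooperation point over the NE does not depend on the channel gains. The maximum transmit power is ignored, and all names are hypothetical.

```python
import math

def f(x, M):
    """Efficiency function f(x) = (1 - exp(-x))**M."""
    return (-math.expm1(-x)) ** M

def f_prime(x, M):
    return M * (-math.expm1(-x)) ** (M - 1) * math.exp(-x)

def bisect(g, lo, hi, iters=200):
    """Bisection, assuming g(lo) > 0 > g(hi) and a single sign change."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def welfare_gain_percent(M, K, N):
    """100 * (w_DRG - w_NE) / w_NE under the proportionality assumption above."""
    c = (K - 1) / N
    beta = bisect(lambda x: x * f_prime(x, M) - f(x, M), 1e-6, 50.0)
    if c * beta >= 1.0:
        raise ValueError("no one-shot NE: K must satisfy K < N / beta* + 1")
    gamma = bisect(lambda x: x * (1 - c * x) * f_prime(x, M) - f(x, M), 1e-6, 1.0 / c)
    phi = lambda x: f(x, M) / x * (1 - c * x)
    return 100.0 * (phi(gamma) / phi(beta) - 1.0)

for K in (10, 20, 30, 35):
    print(K / 128, welfare_gain_percent(M=10, K=K, N=128))
```

Because the NE utility vanishes as $K$ approaches $N/\beta^* + 1$, the computed gain grows without bound in this idealized sketch; in practice it is capped by the maximum transmit power, as noted above.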

• Finally, Fig. 5 represents the ratio $\frac{w_{\mathrm{FRG}}}{w_{\mathrm{NE}}}$ as a function of the number of stages played (averaged over channel
gains). Considering the constants $(K, M, N) = (35, 10, 128)$, $P^{\max}$ = 10"^ Watt, $\sigma^2$ = 10"^ Watt,
$10\log_{10}(\eta_i^{\max}/\eta_i^{\min}) = 20$ and equation (10), a cooperation plan can be set up as soon as $T_0 = 2852$ stages.
This curve gives an idea of what a transmitter can gain by cooperating: the normalized gain in terms of
utility goes from 1 to 6 depending on the number of stages of the game (between 2852 and 15000 stages).
For example, in a cellular system where power levels are updated with a typical frequency of 1500 Hz, this
would mean that cooperating is a good option if several transmitters are using the same resources for more
than 1 s. The ratio $\frac{w_{\mathrm{FRG}}}{w_{\mathrm{NE}}}$ has a limit when $T \to +\infty$. The latter is easy to obtain since

$$\frac{w_{\mathrm{FRG}}}{w_{\mathrm{NE}}} = \frac{\phi(\tilde{\gamma}) \sum_{i=1}^{K} \sum_{t=1}^{T-T_0} |g_i(t)|^2 + \phi(\beta^*) \sum_{i=1}^{K} \sum_{t=T-T_0+1}^{T} |g_i(t)|^2}{\phi(\beta^*) \sum_{i=1}^{K} \sum_{t=1}^{T} |g_i(t)|^2}$$

where $\phi(x) = \frac{f(x)}{x}\left[1 - (K-1)x\right]$. It follows that, for a given $T_0$, $\lim_{T \to +\infty} \frac{w_{\mathrm{FRG}}}{w_{\mathrm{NE}}} = \frac{\phi(\tilde{\gamma})}{\phi(\beta^*)}$.
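The behaviour of this ratio as $T$ grows can be checked with a few lines of code. The sketch below assumes that the average channel gain is the same at every stage, so the sums over $|g_i(t)|^2$ reduce to stage counts; the values `phi_op` and `phi_ne` (standing for $\phi(\tilde{\gamma})$ and $\phi(\beta^*)$) are illustrative placeholders, and the function name is hypothetical.

```python
def frg_over_ne(T, T0, phi_op, phi_ne):
    """Expected w_FRG / w_NE for a T-stage game ending with T0 NE stages.

    Assumption: the mean of |g_i(t)|^2 does not depend on t, so that it
    cancels between numerator and denominator.
    """
    if T < T0:
        raise ValueError("the game must last at least T0 stages")
    return ((T - T0) * phi_op + T0 * phi_ne) / (T * phi_ne)

T0 = 2852
phi_op, phi_ne = 7.0, 1.0            # illustrative values only
for T in (2852, 5000, 15000, 10**7):
    print(T, frg_over_ne(T, T0, phi_op, phi_ne))

# Duration of the T0 final stages at a 1500 Hz power update rate, in seconds.
print(T0 / 1500.0)
```

With $T = T_0$ every stage is played at the one-shot NE and the ratio is 1; as $T$ grows, the weight of the final $T_0$ stages vanishes and the ratio approaches `phi_op / phi_ne`.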

VI. Conclusion 

One of the messages of this paper is that, by taking into account the fact that transmitters interact several
times, it is possible to incite selfish transmitters to operate at lower powers, which leads to an equilibrium
point that Pareto-dominates the one-shot NE point of [3] and the SE point of [5]. It has been proven that
the proposed equilibrium strategies only require individual CSI and a public signal, which is available in
many wireless systems. Additionally, this equilibrium is also fair in terms of SINR, similarly to the
one-shot game NE. In terms of modeling, two models of RG have been analyzed: the finitely RG is
suited to situations where the number of game stages is known, whereas the discounted RG is more suited
to situations where it is uncertain or when typical features of wireless networks (delay sensitivity of the
network, the fact that users can enter/leave the system, or the fact that transmitters can value the current
and future utilities differently) have to be accounted for. An apparent drawback of these two models is the
existence of an admissible range for the channel gain dynamics. When only path loss is considered, the
corresponding effect is generally negligible. If fading is also considered, simulations have shown that the
corresponding impact seems to be limited if the game stopping probability (resp. number of game stages)
is reasonably low (resp. high). Otherwise, the proposed model probably needs some refinements such as
those proposed in [28], where the author studies the influence of the value of the discount factor on the set
of possible equilibrium utilities for the prisoners' dilemma. As a more general extension of this paper, it
would be important to apply the proposed approach to other network types such as the interference channel
and to study other equilibrium points (depending on the fairness criterion under consideration). Last but
not least, the proposed communication and game models, even though they are commonly used, should
be refined to propose cooperation plans that are more robust to imperfect modeling and to the inherent uncertainty on the
quantities used.

Appendix A 
Proof of Proposition 5

Assume that $U_g$ is convex. Let $u$ be a point of $U_g$. Define the hyperplane $\mathcal{H}(u)$, orthogonal to the vector
$\left(\frac{1}{|g_1|^2}, \ldots, \frac{1}{|g_K|^2}\right)$ and containing $u$, by: $\mathcal{H}(u) = \left\{ (u'_1, \ldots, u'_K) \in \mathbb{R}^K : \sum_{i=1}^{K} \frac{u'_i}{|g_i|^2} = \sum_{i=1}^{K} \frac{u_i}{|g_i|^2} \right\}$. The first observation
to be made is that the point $\tilde{u} = (u_1(\tilde{p}), \ldots, u_K(\tilde{p}))$ associated with the OP maximizes the weighted sum $\sum_{i=1}^{K} w_i u_i$
if $w_i = \frac{1}{|g_i|^2}$. Note that $\tilde{u}$ is unique since the solution of $x\left[1 - (K-1)x\right] f'(x) - f(x) = 0$ is unique:
$\tilde{p} = \arg\max_{p} \sum_{i=1}^{K} \frac{u_i(p)}{|g_i|^2}$. By definition, $\tilde{u}$ maximizes the Euclidean distance $d(u, 0)$ for all points on the
line originating from $0$ and directed by the vector $\left(\frac{1}{|g_1|^2}, \ldots, \frac{1}{|g_K|^2}\right)$, orthogonal to the hyperplane $\mathcal{H}(\tilde{u})$. In
conclusion, $\tilde{u}$ maximizes the sum $\sum_{i=1}^{K} \frac{u_i}{|g_i|^2}$ and, because of the convexity of the utility region, no other
achievable point can dominate the hyperplane $\mathcal{H}(\tilde{u})$, which shows that $\tilde{u}$ is PO.

Now assume that $U_g$ is convex. Denote by $u_i^{\max}$ the maximal utility for player $i$. Define a set of $K$ points
$(u_i)_{i \in \mathcal{K}}$ such that: $u_i = (0, \ldots, u_i^{\max} + 1, \ldots, 0)$. Let $P$ be a polyhedron defined as the subset of $\mathbb{R}_+^K$ that
dominates every hyperplane passing through the OP $\tilde{u}$ and a combination of $K - 1$ points $u_i$. As $U_g$ is
convex, this polyhedron intersects the achievable utility region in a unique point, which is the OP $\tilde{u}$. Since
the positive orthant at the OP is strictly included in the polyhedron, the OP is therefore PO.

Appendix B 
Proof of Proposition [6] 



Statement (i). From Sec. |II-C| we have that 



7 



7 



Since by definition $\beta^* > \gamma^*$, we readily see that this ratio is greater than 1.
Statement (ii). From Sec. II-C we have that



^[i-jK-imi+r) 

< ^[1-{K- l)7*/3* -{K- 2)/3*] 



/(/3*) 



{1 - P*[iK - l)r + K - 2]} 



(15) 



- i^(p'^{K-i)rP'+i-(K-i)p*) ^ 



(17) 



if{l-^[{K-l)Y + K-2]} 
^ f(f^*\ r. TTTT: ITT- (1^) 



We want to prove that this ratio is greater than or equal to 1. We consider the quantity:

^fe./3*]W =^{l-7K[(J^-l)7l' + ^-2]} (19) 

= K{j{n - fi^Km^K + 1) + ^ (20) 

-^ -(/(/?*) -/(7k))(7a- + 2) 

> Kifin - f{^K)){iK + 1) + ^ (21) 

-^-(/(/3*)-/(7k))(/3*+2) (22) 
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As 7x goes to zero as K goes to infinity, ('^k < 7^) ^^'^ by hypothesis /(O) = 0, lim^_s.o ^^ = : 

limK^ + oo'y5[T.^,/3'](A') (23) 

> limx^+co i^[(/(/3*) - f{lK)){iK + 1)] (24) 

+ ^ - ^ - (/(/?*) - /(7k))(/3* + 2) 
> limK-.+oo ^(/(/3*)) - ^ - (/(/3*))(/3* + 2) = +c» (25) 

In conclusion, the sequence (v^[7K,/3*](-^))ft:>2 is strictly increasing and its limit is +00. There exists an 
integer Kq such that for all K > Kq, v9[;^^_^.](_ft') > 0. This implies that Ui{p) > uf. 

Appendix C 
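Proof of Proposition 7

The goal is to show that, $\forall i \in \mathcal{K}$, $\frac{u_i(\tilde{p})}{u_i(p^*)} = \frac{\phi(\tilde{\gamma})}{\phi(\beta^*)}$ is greater than or equal to one. For this, consider
the function $\phi(x) = \frac{f(x)}{x}\left[1 - (K-1)x\right]$. The derivative of $\phi$ is $\phi'(x) = \frac{x\left[1 - (K-1)x\right] f'(x) - f(x)}{x^2}$, which vanishes
at a unique point, namely $\tilde{\gamma}$. Therefore the function $\phi$ is strictly increasing on $]0, \tilde{\gamma}[$ and strictly decreasing
on $]\tilde{\gamma}, +\infty[$ and thus reaches its maximum at $\tilde{\gamma}$, which concludes the proof.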
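The monotonicity argument can be checked numerically for the efficiency function $f(x) = (1 - e^{-x})^M$. The sketch below is only an illustration under assumptions: it uses the scaled form $\phi(x) = \frac{f(x)}{x}[1 - cx]$ with $c = (K-1)/N$ (taking $N = 1$ gives the expression used in the proof), it locates the stationary point by bisection, and all names are hypothetical.

```python
import math

def f(x, M):
    return (-math.expm1(-x)) ** M

def f_prime(x, M):
    return M * (-math.expm1(-x)) ** (M - 1) * math.exp(-x)

def phi(x, M, c):
    """phi(x) = f(x)/x * (1 - c x), with c = (K-1)/N."""
    return f(x, M) / x * (1.0 - c * x)

def phi_prime(x, M, c):
    """Closed form: [x (1 - c x) f'(x) - f(x)] / x**2."""
    return (x * (1.0 - c * x) * f_prime(x, M) - f(x, M)) / x ** 2

def stationary_point(M, c, iters=200):
    """Root of phi' on (0, 1/c) by bisection (phi' > 0 near 0, < 0 at 1/c)."""
    lo, hi = 1e-6, 1.0 / c
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if phi_prime(mid, M, c) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

M, c = 2, 3 / 5                      # e.g. K = 4 transmitters, N = 5
gamma = stationary_point(M, c)
grid = [k * (1.0 / c) / 1000 for k in range(1, 1000)]
is_max = all(phi(gamma, M, c) >= phi(x, M, c) for x in grid)
print(gamma, is_max)
```

The grid check confirms, for these parameters, that the stationary point is the maximizer of $\phi$ on the interval where $\phi$ is positive.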

Appendix D 
Proof of Theorems 8 and 9

The proofs are provided in the general case where the PC game is repeated for different channel realizations
(DSPC). At a game stage $t$, a transmitter therefore has to consider future realizations of the channels, which
are unknown at stage $t$. To tackle this issue, we use a dynamic programming principle [21], which is standard
in repeated games. Let us define $\bar{u}_i = \max_{p} u_i(p) = |g_i|^2 \frac{f(\beta^*)}{\beta^*}$, the maximal utility player $i$ can get for a
fixed channel gain $|g_i|^2$. First consider the FRG. It is clear that during the last $T_0$ stages, the players play
the one-shot NE. Thus no deviation over this period can be profitable to any player. We now consider
that a player deviates during the first $T - T_0$ stages. His deviation utility is bounded by $\bar{u}_i$ and he will
be punished at his minmax level in the next stage. Suppose the deviator deviates at stage $t \leq T - T_0$.
Denote by $\underline{u}_i = \min_{p_{-i}} \max_{p_i} u_i(p)$ with $p_{-i} = (p_1, \ldots, p_{i-1}, p_{i+1}, \ldots, p_K)$. The deviation utility of the FRG
is upper-bounded as:

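$$\leq \sum_{s=1}^{T-T_0-1} u_i(\tilde{p}(s)) + \bar{u}_i(p(T - T_0)) + \sum_{s=T-T_0+1}^{T} \mathbb{E}_g\left[\underline{u}_i(p(s))\right]$$

The equilibrium condition for the FRG at stage $t$ writes as:

$$\sum_{s=1}^{T-T_0-1} u_i(\tilde{p}(s)) + \bar{u}_i(p(T - T_0)) + \sum_{s=T-T_0+1}^{T} \mathbb{E}_g\left[\underline{u}_i(p(s))\right]$$

$$\leq \sum_{s=1}^{T-T_0-1} u_i(\tilde{p}(s)) + u_i(\tilde{p}(T - T_0)) + \sum_{s=T-T_0+1}^{T} \mathbb{E}_g\left[u_i(p^*(s))\right]$$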
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< Mpit)) + ELt-to+i^^KW^))] 



<lff. 



|2 /(7)(l-(Jf-l)7) 



E2 



Igd'/C/?*) 



=^-^«+i^^/3*fc«\.pr''i«^i'+-' 



+ Es=T-Tn + llEg[|5i 



2l /(/3'-)(l-(Jy-l)/3-') 
J /3- 



Now we want to show that the last inequality is verified under the sufficient condition of Theorem 8. The 
sufficient condition of Theorem 8 implies that: 



„max /(;3-') _ rnin f(^)(l-(K-l)=,) 
'li j3* 'li ^ 



<To 



„min J(;3*)(l-(-K-l)/3*) 



vT^'fiP') 



^--/m + To 



'7'"°"/(/3*) 



< min /(7)(l-(Jf-l)7) ^_ jn min /(/3* Xl-y-lj/S* ) 



:/(/3*) 



max JU3_J I Y^J 






< „min /(7)(l-(Jf-l)7) , V^ r,™'" /(/a* )(1- (K-lj/S* ) 



The worst-case scenario for stochastic channel gains implies the average-case scenario. The condition of
Theorem 8 is therefore sufficient for the desired equilibrium condition to hold at each stage $t$ of the FRG. This concludes
the proof for the FRG. Consider now the equilibrium condition for the DRG at stage $t$:

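$$\lambda |g_i|^2 \frac{f(\beta^*)}{\beta^*} + \sum_{s \geq t+1} \lambda (1-\lambda)^{s-t}\, \mathbb{E}_g\left[|g_i|^2\right] \frac{f(\beta^*)\left(1 - (K-1)\beta^*\right)}{\beta^*}$$

$$\leq \lambda |g_i|^2 \frac{f(\tilde{\gamma})\left(1 - (K-1)\tilde{\gamma}\right)}{\tilde{\gamma}} + \sum_{s \geq t+1} \lambda (1-\lambda)^{s-t}\, \mathbb{E}_g\left[|g_i|^2\right] \frac{f(\tilde{\gamma})\left(1 - (K-1)\tilde{\gamma}\right)}{\tilde{\gamma}}$$

$$\Longleftrightarrow \quad \lambda |g_i|^2 \left[ \frac{f(\beta^*)}{\beta^*} - \frac{f(\tilde{\gamma})\left(1 - (K-1)\tilde{\gamma}\right)}{\tilde{\gamma}} \right] \leq \sum_{s \geq t+1} \lambda (1-\lambda)^{s-t}\, \mathbb{E}_g\left[|g_i|^2\right] \left[ \frac{f(\tilde{\gamma})\left(1 - (K-1)\tilde{\gamma}\right)}{\tilde{\gamma}} - \frac{f(\beta^*)\left(1 - (K-1)\beta^*\right)}{\beta^*} \right]$$

The equilibrium condition for the DRG can be obtained by following the same reasoning as for the FRG.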
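The last step can be spelled out as follows. This is a sketch of the reasoning rather than the authors' derivation; it assumes that $\eta_i^{\min}$ and $\eta_i^{\max}$ bound both $|g_i|^2$ and its expectation, and it writes $\phi(x) = \frac{f(x)}{x}[1 - (K-1)x]$ and $\delta = \delta(\beta^*,\tilde{\gamma}) = \phi(\tilde{\gamma}) - \phi(\beta^*)$.

```latex
% Assumption: \eta_i^{\min} \le |g_i|^2 \le \eta_i^{\max} and
% \eta_i^{\min} \le \mathbb{E}_g[|g_i|^2].
\begin{align*}
\sum_{s \ge t+1} \lambda (1-\lambda)^{s-t}
  &= \lambda \cdot \frac{1-\lambda}{\lambda} = 1 - \lambda, \\
\frac{f(\beta^*)}{\beta^*} - \phi(\tilde{\gamma})
  &= \Big[\tfrac{f(\beta^*)}{\beta^*} - \phi(\beta^*)\Big]
     - \big[\phi(\tilde{\gamma}) - \phi(\beta^*)\big]
   = (K-1) f(\beta^*) - \delta, \\
\lambda\, |g_i|^2 \big[(K-1) f(\beta^*) - \delta\big]
  &\le (1-\lambda)\, \mathbb{E}_g\!\left[|g_i|^2\right] \delta
  \quad \text{(stage-$t$ condition)}, \\
\lambda\, \eta_i^{\max} \big[(K-1) f(\beta^*) - \delta\big]
  &\le (1-\lambda)\, \eta_i^{\min}\, \delta
  \quad \text{(worst case, sufficient)}, \\
\lambda
  &\le \frac{\eta_i^{\min}\, \delta}
            {\eta_i^{\min}\, \delta + \eta_i^{\max} \big[(K-1) f(\beta^*) - \delta\big]}.
\end{align*}
```

The worst-case line replaces the instantaneous gain of the deviator by its largest value and the expected gain by its smallest value, which is why the resulting bound holds for any distribution of the channel gains within the admissible range.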
Fig. 1. Normalized achievable utility region for $(K, M, N) = (2, 2, 2)$ plus four important points.

Fig. 2. Admissible channel gain dynamics vs. number of stages of the game $T$ for $M = 2$ and $(K, N) \in \{(2, 2), (4, 5), (10, 12)\}$.

Fig. 3. Admissible channel gain dynamics vs. discount factor for $M = 2$ and $(K, N) \in \{(2, 2), (4, 5), (10, 12)\}$.

Fig. 4. Percentage of improvement of social utility for the repeated game equilibrium ($w_{\mathrm{DRG}}$) and for Stackelberg ($w_{\mathrm{SE}}$) vs. Nash equilibrium ($w_{\mathrm{NE}}$)
as a function of the system load ($K/N$).

Fig. 5. Improvement of social utility for the finitely repeated game equilibrium ($w_{\mathrm{FRG}}$) vs. Nash equilibrium ($w_{\mathrm{NE}}$) as a function of the number of
stages $T$ of the game.